Abstract: Subjective logic (SL) is an effective tool to manage
and update beliefs over a set of mutually exclusive assertions.
The method to update subjective beliefs from direct observations
of assertions is well understood. Recent work has incorporated
the SL framework to derive the belief update equations for
partial observations where the measurements are only statistically
related to the assertions. This work further expands the notion of
SL to consider uncertainty in the underlying statistical relationship
between measurements and assertions. In other words, new
methods are derived for SL that incorporate uncertainty in the
reported likelihood of the assertions. Simulations demonstrate
the utility of the new likelihood-uncertainty-aware belief update
methods.

I.  Introduction 

This  work  investigates  methods  to  compensate  for  uncertain 
observations  in  updating  subjective  logic  (SL)  beliefs.  SL 
has emerged as a rigorous method to represent and reason
over human generated or automated beliefs in the face of
uncertainty [1]. Applications of SL include trust management [2],
Bayesian  networks  [3],  and  fusion  [1],  [4].  In  short,  SL 
provides  effective  tools  to  manage  and  combine  beliefs  over  a 
set  of  mutually  exclusive  assertions  from  multiple  human  or 
computer  agents.  At  a  given  point  in  time,  an  agent’s  belief 
is  the  result  of  a  prior  belief  and  a  set  of  observations.  The 
uncertainty  of  the  belief  represents  the  reliance  on  the  prior, 
and  the  uncertainty  decreases  as  the  agent  incorporates  more 
observations  to  form  the  beliefs  over  the  set  of  assertions. 

To our knowledge, the current SL operations focus on
fusing beliefs or exploiting belief for inference. This work
is concerned with how to update subjective beliefs from
observations. Implicitly, SL provides the operations to update
beliefs when one of the possible assertions is completely
visible in the observation. In recent work, we expanded the
notions of SL to incorporate partial observations where the
assertions are only statistically related to the observations [5].
This work expands [5] by considering uncertainty in the
knowledge about the statistical relationship between the
observations and the assertions.

To make these notions a little more concrete, let us consider
a motivating example where one wants to understand the
criminal activity within a city. Specifically, one wants to
understand, if a crime happens, what is the probability that the
crime occurs in any one of the districts. Without any initial
data, one might look at socio-economic factors to develop an
initial set of probabilities. Over time, one can log where a
crime occurs and start to use these observations to update the
probabilities. Clearly, as more observations are logged, the
certainty associated with the generated probabilities increases.
SL is well suited to infer the probabilities of a crime occurring
in the districts and the uncertainty associated with these
probabilities.

Now,  let  us  assume  that  one  is  interested  in  where  criminals 
live.  The  question  is  now  when  a  crime  occurs,  what  is  the 
probability  that  the  perpetrator  lives  in  a  particular  district. 
Like before, one can start with a prior set of probabilities based
upon the socio-economic factors. Furthermore, when a crime
occurs, the location of the crime is readily available in the
police report. However, the identity of the perpetrator may
never be discovered. Therefore, it is generally not possible
to log where the perpetrator lives. Sometimes, this information
can be determined with great likelihood when the criminal is
caught. Most likely, one can only incorporate statistical models
that give the probability of where the perpetrator lives
conditioned on where a crime occurs. For instance, a criminal
may  not  operate  in  his/her  immediate  neighborhood  where 
he/she  can  easily  be  identified,  and  a  criminal  may  not  want 
to  venture  too  far  away  either.  This  contextual  information  can 
help  answer  the  questions  of  the  distribution  of  criminals  over 
the  various  districts  within  a  city.  This  scenario  is  an  example 
of  a  geospatial  abduction  problem  (GAP)  [6].  SL  is  suited  to 
tackle  such  applications,  but  the  notions  of  how  to  incorporate 
statistical  (and  not  just  hard)  evidence  of  the  appearance  of 
an  assertion  (the  home  district  of  a  perpetrator)  need  to  be 
developed  within  the  SL  framework. 

SL  is  a  probabilistic  logic  for  assigning  and  updating  basic 
belief  assignments  (BBA).  Classic  SL  considers  BBAs  on  a  set 
of  mutually  exclusive  singletons  [1],  [2]  to  form  an  SL  opinion. 
The attractive feature of SL is that the multinomial opinion
has a one-to-one mapping with parameters of a Dirichlet
distribution. Formally, the Dirichlet distribution is the conjugate
prior of the multinomial distribution. This means that it is the
natural distribution to represent knowledge about the weights
associated with a loaded die after observing a number of die
rolls. The parameters of the Dirichlet distribution encode the
results of the die rolls. In essence, an SL opinion is formed
through a series of observations that equate to tabulating the
results of a number of independent rolls of the die.

In [5], SL was expanded to handle the case that the result
of the die roll is not observed. Rather, a partial observation
is provided in the form of the likelihood of each die roll outcome
given the features that are measured from some type of physical
sensor or a human generated report. The boldness* that can
be expressed in the likelihood is a function of the quality of
information inherent in the measured features [7].

This paper expands SL further to accommodate the veracity
of the likelihood calculations themselves. For instance, the
likelihood calculations could be the result of learned
distributions from a labeled training set, and the certainty in
reported likelihood values is a function of the number of
training samples and the similarity between the environmental
collection conditions for the training and testing data.
Alternatively, one may only be able to use heuristics to approximate
the likelihood. Finally, the reported likelihoods could come
from another source that may be intentionally obfuscating
the reported likelihoods to allow the consumer to make
certain inferences but not allow the consumer to infer other
private information [8]. This work is the first step to rigorously
understand how SL should reason over uncertainty in the
reported likelihoods for a given partial observation.

This paper is organized as follows. Section II reviews belief
updates in SL for fully visible observations, and Section III
reviews the expansion of SL to incorporate partial measurements.
Then, Section IV expands SL further to incorporate uncertainty
in the likelihood values. Simulations in Section V demonstrate
the potential utility of incorporating knowledge about the
likelihood uncertainty. Finally, Section VI provides concluding
remarks.


II. Subjective Logic

SL is a probabilistic logic to represent one's belief in a set of
$K$ mutually exclusive assertions and the uncertainty in these
beliefs [1]. Formally, SL considers a frame of $K$ mutually
exclusive singletons by providing a belief mass $b_k$ for each
singleton $k = 1, \ldots, K$ and providing an overall uncertainty
mass of $u$. These $K+1$ mass values are all non-negative and
sum up to one, i.e.,

$$ u + \sum_{i=1}^{K} b_i = 1, \tag{1} $$

s.t. $u \ge 0$ and $b_i \ge 0$ for $i = 1, \ldots, K$.

SL also includes a base rate probability $a_k$ for each singleton
and a non-informative prior weight $W$. The collection of
all the parameters forms the multinomial opinion. The base
rate values represent initial (or prior) information about the
probability of a singleton emerging for any given observation.

*By  boldness,  we  mean  the  degree  for  which  the  likelihood  of  one  class 
(or  singleton)  is  larger  than  that  of  the  other  classes. 


The belief and uncertainty values, together with the base rates
and the non-informative prior weight, represent the
accrued evidence regarding the probability of any singleton
appearing in an observation. Specifically, these values map
to a Dirichlet distribution for the possible probability mass
function (pmf) that is controlling how singletons appear in
observations. The parameters of the Dirichlet distribution
are related to the multinomial opinion values via

$$ \alpha_k = \frac{W b_k}{u} + W a_k. \tag{2} $$

Likewise, using (1), solving for $b_k$ and $u$ in (2) for
$k = 1, \ldots, K$ leads to the mapping of $\boldsymbol{\alpha}$ to the
multinomial opinions

$$ u = \frac{W}{\sum_{i=1}^{K} \alpha_i}, \tag{3a} $$

$$ b_k = \frac{1}{\sum_{i=1}^{K} \alpha_i} \left( \alpha_k - W a_k \right). \tag{3b} $$

Note that binary logic corresponds to the special case of binary
opinions, where the size of the frame is $K = 2$.
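The mappings in (2) and (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example opinion are hypothetical, the prior weight defaults to W = 2, and the forward mapping requires u > 0.

```python
from typing import List, Sequence, Tuple


def opinion_to_dirichlet(
    b: Sequence[float], u: float, a: Sequence[float], W: float = 2.0
) -> List[float]:
    """Eq. (2): alpha_k = W * b_k / u + W * a_k (requires u > 0)."""
    if u <= 0.0:
        raise ValueError("u must be positive for the mapping in (2)")
    return [W * bk / u + W * ak for bk, ak in zip(b, a)]


def dirichlet_to_opinion(
    alpha: Sequence[float], a: Sequence[float], W: float = 2.0
) -> Tuple[List[float], float]:
    """Eq. (3): u = W / sum(alpha), b_k = (alpha_k - W * a_k) / sum(alpha)."""
    s = sum(alpha)
    u = W / s
    b = [(alpha_k - W * a_k) / s for alpha_k, a_k in zip(alpha, a)]
    return b, u


# Hypothetical binary opinion: beliefs 0.6 and 0.2, uncertainty 0.2,
# and uniform base rates.
b, u, a = [0.6, 0.2], 0.2, [0.5, 0.5]
alpha = opinion_to_dirichlet(b, u, a)            # approximately [7, 3]
b_back, u_back = dirichlet_to_opinion(alpha, a)  # recovers b and u
```

The round trip recovers the original opinion up to floating-point error, which is what the one-to-one mapping between opinions and Dirichlet parameters requires.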

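The Dirichlet distribution represents the probability
distribution of the singleton likelihood probabilities $p_k$. The
Dirichlet distribution with parameters $\boldsymbol{\alpha}$† for the
probability mass function (pmf) $\mathbf{p}$ is

$$ f_\beta(\mathbf{p}|\boldsymbol{\alpha}) = \begin{cases} \dfrac{1}{B(\boldsymbol{\alpha})} \prod_{k=1}^{K} p_k^{\alpha_k - 1} & \text{for } \mathbf{p} \in S, \\ 0 & \text{otherwise,} \end{cases} \tag{4} $$

where

$$ B(\boldsymbol{\alpha}) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)} $$

is the multinomial Beta function and the unit simplex
$S = \{\mathbf{p} \mid \sum_{i=1}^{K} p_i = 1\}$ is the set of admissible values of
$\mathbf{p}$. For $K = 2$, the Dirichlet distribution is equivalent to the beta
distribution. The values of the $\alpha_k$'s relative to each other
give the expected value of $\mathbf{p}$ for the Dirichlet
distribution, i.e.,

$$ \hat{p}_k = \frac{\alpha_k}{\sum_{i=1}^{K} \alpha_i}. \tag{5} $$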

When the Dirichlet distribution represents the posterior, $\hat{\mathbf{p}}$
represents the minimum mean square error (MMSE) estimate
of the ground truth appearance probabilities given the
observations that form the beliefs. Thus, (2), (3), and (5) lead to
the mapping of beliefs, uncertainty, and base rates to the
MMSE estimates for the appearance probabilities as given by

$$ \hat{p}_k = b_k + u a_k. \tag{6} $$

The Dirichlet distribution peaks near its mean value (5).‡ The
scaling of the Dirichlet parameters,

$$ s = \sum_{i=1}^{K} \alpha_i, \tag{7} $$

determines the "spread" or variance of the Dirichlet distribution
around its peak. Equivalently, it represents the strength of the
confidence in the mean (or the MMSE estimate) to characterize
the actual ground truth for $\mathbf{p}$. This value is commonly
referred to as the precision parameter. As $s$ increases, the peak
becomes higher and narrower. In the limit, as $s \to \infty$, the
Dirichlet converges to a Dirac delta function. Clearly by (3a),
the precision value is inversely proportional to uncertainty.

†A boldface term $\mathbf{x}$ is a vector whose $k$-th element is $x_k$.

‡As the Dirichlet precision increases to infinity, the peak and mean become
arbitrarily close to each other.
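As a small numerical check of (5), (6), and (3a), the following sketch computes the expected probabilities both from the Dirichlet parameters and from the opinion values. The helper names and the example parameters are hypothetical.

```python
from typing import List, Sequence


def expected_from_alpha(alpha: Sequence[float]) -> List[float]:
    """Eq. (5): mean of the Dirichlet distribution."""
    s = sum(alpha)  # precision, eq. (7)
    return [alpha_k / s for alpha_k in alpha]


def expected_from_opinion(
    b: Sequence[float], u: float, a: Sequence[float]
) -> List[float]:
    """Eq. (6): belief plus uncertainty-weighted base rate."""
    return [b_k + u * a_k for b_k, a_k in zip(b, a)]


W = 2.0
alpha = [7.0, 3.0]   # hypothetical Dirichlet parameters
a = [0.5, 0.5]       # uniform base rates
s = sum(alpha)
u = W / s            # eq. (3a): uncertainty shrinks as precision grows
b = [(alpha_k - W * a_k) / s for alpha_k, a_k in zip(alpha, a)]  # eq. (3b)

p_from_alpha = expected_from_alpha(alpha)
p_from_opinion = expected_from_opinion(b, u, a)
```

Both routes give the same estimate, and doubling every Dirichlet parameter would leave that estimate unchanged while halving the uncertainty.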

The fusion of two subjective opinions consists of mapping
opinions into the Dirichlet parameters, summing up the
parameters while taking care not to double-count the base
rates, and then mapping back into the multinomial opinion
space [2]. This method for fusing implies that subjective
opinions are formed by observations that increment the
Dirichlet parameters so that fused opinions account for all these
observation increments. When the observation is the singleton
that appears, the updates in SL are clear. Since the singleton
appearance is drawn from the multinomial distribution, and the
current belief is represented by a Dirichlet distribution, which
is the conjugate prior of the multinomial, the posterior is also
Dirichlet. When the $k$-th singleton is observed to appear,
the parameters for the updated Dirichlet distribution are known
to be

$$ \boldsymbol{\alpha}^+ = \boldsymbol{\alpha} + \mathbf{e}_k, \tag{8} $$

where $\mathbf{e}_k$ is the indicator vector whose $k$-th element is one and
whose other elements are all zero. Then $\boldsymbol{\alpha}^+$ can be inserted
into (3) to obtain the updated beliefs and uncertainty. Overall,
a multinomial opinion is formed by simply counting the
occurrences of singletons to maintain the Dirichlet parameters,
and equivalently, the multinomial opinion values. Typically,
the prior weight is $W = 2$. It represents the strength of the prior
in influencing updated beliefs relative to the observations.
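The counting interpretation of (8) can be illustrated as follows. This is a minimal sketch: the function name, the 0-based class index, and the sequence of observed outcomes are all hypothetical.

```python
from typing import List, Sequence


def visible_update(alpha: Sequence[float], k: int) -> List[float]:
    """Eq. (8): add one to the k-th Dirichlet parameter (k is 0-based here)."""
    updated = list(alpha)
    updated[k] += 1.0
    return updated


W = 2.0
a = [0.5, 0.5]
alpha = [W * a_k for a_k in a]   # vacuous opinion: b = 0 and u = 1

# Hypothetical sequence of fully visible observations.
for outcome in [0, 0, 1, 0]:
    alpha = visible_update(alpha, outcome)

s = sum(alpha)
u = W / s                                                        # eq. (3a)
b = [(alpha_k - W * a_k) / s for alpha_k, a_k in zip(alpha, a)]  # eq. (3b)
```

After three appearances of the first singleton and one of the second, the parameters are [4, 2], so the uncertainty has dropped from 1 to 1/3.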

The many operations that exist in SL for multinomial or just
for binary opinions are not completely amenable to a mapping
to the Dirichlet distribution in the sense of fusion and updates
from observations. One example is the "and" or multiplication
operation for binary opinions [9]. SL is a tractable framework,
but it approximates belief propagation via parameters of a
Dirichlet distribution. For any operation in SL, the operands
are assumed to follow the Dirichlet distribution. A Dirichlet
distribution is fitted to the output of the operation in a manner
that preserves the mean. However, to maintain the properties
of SL, the variance is approximated. In essence, the values of
the $\alpha_k$'s relative to each other are maintained. However, the sum
of the $\alpha_k$'s is approximated. By (3a), the Dirichlet precision
is inversely proportional to the uncertainty. These principles
for handling mathematical operations in SL are used in the
next two sections to add the measurement likelihood update
operation into the SL framework.


III. Measurement Updates for SL

Usually, it is not possible to update beliefs in singletons
by directly observing the singletons in an event. Rather, a
measurement of the event is made that is statistically related
to the occurrence of the singleton. The measurement forms
a feature vector $\mathbf{x}$. An observer transforms the feature
vector into a likelihood vector that represents how likely the
feature vector was caused by the occurrence of each of the
singletons (or classes) in the frame. The observer does this
either by experience or with assistance from computational
classifiers that learn how to represent the likelihood from a
set of training data. In either case, this section assumes
that the observer or classifier is able to determine the
correct likelihoods for each class, which is simply the value
of the conditional distributions corresponding to the feature
vector $\mathbf{x}$, i.e.,

$$ l_i = f(\mathbf{x} | z = i), \tag{9} $$

where $f(\cdot | z = i)$ is the probability density function (pdf) for
the measured features conditioned on the appearance of the
$i$-th class where $1 \le i \le K$. In [5], the SL belief update method
for partial observations that are reported as class likelihoods
was developed. This section reviews that development.

A. Naive Belief Update

The naive approach for the partial observation update is
to spread the mass of the Dirichlet update in (8) via the
normalized likelihood

$$ \bar{l}_k = \frac{l_k}{\sum_{i=1}^{K} l_i}, \tag{10} $$

so that

$$ \alpha_k^+ = \alpha_k + \bar{l}_k. \tag{11} $$

For the case of a visible update where the value of $z$ is known,
i.e., $\bar{\mathbf{l}} = \mathbf{e}_i$, then (11) simplifies to (8). While this naive
approach can be viewed as a generalization of the visible
observation update, it does not yield a posterior Dirichlet
distribution that is a good fit to the actual posterior distribution
of the observation probabilities $\mathbf{p}$.

B. Likelihood Update to Approximate the Posterior

The likelihood update determines the posterior of the observation
probabilities $\mathbf{p}$ given the current subjective opinion and
measurement. Then one fits a Dirichlet density to the posterior
in order to approximate the updated subjective opinion. This
derivation starts with the joint pdf of the feature, i.e., the partial
observation, and the observation probabilities conditioned on
the current multinomial opinion, which is

$$ \begin{aligned} f(\mathbf{x}, z = i, \mathbf{p} | \boldsymbol{\alpha}) &= f(\mathbf{x} | z = i)\, \mathrm{prob}(z = i | \mathbf{p})\, f_\beta(\mathbf{p} | \boldsymbol{\alpha}), \\ &= l_i\, p_i\, f_\beta(\mathbf{p} | \boldsymbol{\alpha}). \end{aligned} \tag{12} $$

Then marginalization to remove the hidden variable $z$ leads to

$$ f(\mathbf{x}, \mathbf{p} | \boldsymbol{\alpha}) = \left( \sum_{i=1}^{K} l_i p_i \right) f_\beta(\mathbf{p} | \boldsymbol{\alpha}), \tag{13} $$

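so that the posterior for the observation probabilities after the
measurement update is

$$ f(\mathbf{p} | \boldsymbol{\alpha}, \mathbf{x}) = \frac{\left( \sum_{j=1}^{K} \alpha_j \right) \sum_{i=1}^{K} l_i p_i}{\sum_{j=1}^{K} l_j \alpha_j}\, f_\beta(\mathbf{p} | \boldsymbol{\alpha}). \tag{14} $$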
Note that (14) is invariant to the scaling of the likelihood.
When $\mathbf{l} = \mathbf{e}_k$, i.e., the likelihood is zero for all classes
except for the $k$-th class, then (14) simplifies to the Dirichlet
distribution $f_\beta(\mathbf{p} | \boldsymbol{\alpha} + \mathbf{e}_k)$, which means that the
target class is fully observable. On the other hand, when
all classes have equal likelihoods, i.e., $\mathbf{l} = \mathbf{1}$,§ (14) simplifies
to $f_\beta(\mathbf{p} | \boldsymbol{\alpha})$, which means the updated beliefs are equivalent
to the previous belief. In other words, the measurement is
vacuous for the case of equal likelihoods. Clearly, the naive
approach given by (11) is not properly updating beliefs for the
vacuous case.
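These limiting cases can be checked numerically. The sketch below relies on a standard Dirichlet identity that is not spelled out in the text, so treat it as an assumption of the example: multiplying a Dirichlet density by p_i yields a scaled Dirichlet with parameters alpha + e_i, so the posterior in (14) is a mixture of such densities with weights proportional to l_i alpha_i. All function names are hypothetical.

```python
from typing import List, Sequence


def mixture_weights(alpha: Sequence[float], lik: Sequence[float]) -> List[float]:
    """Weight of Dir(alpha + e_i) in the posterior: proportional to l_i * alpha_i."""
    weighted = [l_i * a_i for l_i, a_i in zip(lik, alpha)]
    total = sum(weighted)
    return [w / total for w in weighted]


def posterior_mean(alpha: Sequence[float], lik: Sequence[float]) -> List[float]:
    """Mean of the posterior (14), computed from the mixture representation."""
    w = mixture_weights(alpha, lik)
    s = sum(alpha)
    return [(a_k + w_k) / (s + 1.0) for a_k, w_k in zip(alpha, w)]


def naive_update(alpha: Sequence[float], lik: Sequence[float]) -> List[float]:
    """Eqs. (10)-(11): spread one unit of mass by the normalized likelihood."""
    total = sum(lik)
    return [a_k + l_k / total for a_k, l_k in zip(alpha, lik)]


alpha = [7.0, 3.0]                                  # hypothetical prior parameters
vacuous_mean = posterior_mean(alpha, [1.0, 1.0])    # equal likelihoods
scaled_mean = posterior_mean(alpha, [0.2, 0.2])     # same likelihood, rescaled
visible_mean = posterior_mean(alpha, [1.0, 0.0])    # class 1 fully observed
naive_vacuous = naive_update(alpha, [1.0, 1.0])     # becomes [7.5, 3.5]
```

For equal likelihoods the posterior mean stays at the prior mean [0.7, 0.3], while the naive update still raises the precision from 10 to 11, so it claims less uncertainty even though the measurement carried no information.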

The  next  step  is  to  approximate  the  posterior  by  the  Dirichlet 
distribution.  The  following  theorem  helps  to  determine  the  best 
Dirichlet  approximation  of  the  posterior. 


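Theorem 1. For a Dirichlet distribution and the posterior
distribution given in (14) to exhibit identical first-order
moments and to exhibit an identical second-order moment for
the marginal of the $k$-th element, the parameters of that
Dirichlet distribution are given by

$$ \alpha_k^+ = s_k^+\, \frac{\alpha_k + \dfrac{\alpha_k l_k}{\sum_{i=1}^{K} \alpha_i l_i}}{1 + \sum_{i=1}^{K} \alpha_i} \tag{15} $$

for $k = 1, \ldots, K$, where the precision is

$$ s_k^+ = \frac{1 + \sum_{i=1}^{K} \alpha_i}{1 + \dfrac{\left(2 + \sum_{i=1}^{K} \alpha_i\right) l_k\, l_{\bar{k}}}{\left(\sum_{i=1}^{K} \alpha_i l_i\right)\left((1 + \alpha_k)\, l_k + (1 + \alpha_{\bar{k}})\, l_{\bar{k}}\right)}}, \tag{16} $$

$$ \alpha_{\bar{k}} = \sum_{i \ne k} \alpha_i, $$

and

$$ l_{\bar{k}} = \frac{1}{\alpha_{\bar{k}}} \sum_{i \ne k} \alpha_i l_i. \tag{17} $$

Proof: See Appendices A and B in [5]. ■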

Note that $\alpha_{\bar{k}}$ and $l_{\bar{k}}$ represent the total Dirichlet precision
and average likelihood, respectively, associated with the
complement of the $k$-th singleton in the frame. When $K = 2$, it
is easy to verify that $s_1^+ = s_2^+$ because $l_{\bar{1}} = l_2$, $\alpha_{\bar{1}} = \alpha_2$,
$l_{\bar{2}} = l_1$, and $\alpha_{\bar{2}} = \alpha_1$. In general, $s_k^+ \ne s_j^+$ for $k \ne j$, and
it is not possible to select the precision so that all the second-order
moments of the Dirichlet approximation match those of
the posterior. In any event, the larger the updated precision,
the larger the updated Dirichlet parameters (see (7)).

As shown in [5], selecting even the largest value of $s_k^+$ as
the updated precision will usually cause one or
more of the updated parameters to decrease in value. At
best, the smallest change in the updated parameters is zero.
Before too many observations have been incorporated, it is possible
that the decrease in one of the parameter values leads to a
negative subjective belief (see (3b)). In [5], this issue was
avoided by selecting the precision to be large enough to avoid
any decrease in the updated Dirichlet parameters. However,
we have since discovered that a larger than necessary precision
unnecessarily thwarts the influence of new observations. Better
performance is obtained by increasing the magnitude of
the precision only enough to avoid negative subjective beliefs. Thus, the
updated Dirichlet parameters are

$$ \alpha_k^+ = s^+ m_k, \tag{18a} $$

$$ m_k = \frac{\alpha_k}{1 + \sum_{i=1}^{K} \alpha_i} \left( 1 + \frac{l_k}{\sum_{i=1}^{K} \alpha_i l_i} \right), \tag{18b} $$

where the precision $s^+$ is obtained from the $s_k^+$ given by (16),
increased only as needed to avoid negative subjective beliefs.
Algorithm 1 summarizes the
likelihood update process for SL multinomial opinions.

§ $\mathbf{1}$ is the vector whose elements are all one.


Algorithm 1 Likelihood update for SL
Input: SL multinomial opinion and likelihood $\mathbf{l}$
Output: Updated SL multinomial opinion

1) Transform multinomial opinion into Dirichlet parameters
via (2).

2) Update Dirichlet parameters via (16) and (18).

3) Convert updated Dirichlet parameters into updated
multinomial belief via (3).
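The three steps can be sketched in Python as below. Two points are assumptions of this sketch rather than statements of the paper: the precision in step 2 uses the moment-matching form of (16) written in terms of the complement quantities in (17), and the final precision is taken as the largest per-singleton value, raised only if some updated parameter would otherwise fall below W a_k (which is what makes a belief in (3b) negative). The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def likelihood_update(
    b: Sequence[float],
    u: float,
    a: Sequence[float],
    lik: Sequence[float],
    W: float = 2.0,
) -> Tuple[List[float], float]:
    """Sketch of Algorithm 1; assumes u > 0 and strictly positive base rates."""
    K = len(b)
    # Step 1: opinion -> Dirichlet parameters, eq. (2).
    alpha = [W * b[k] / u + W * a[k] for k in range(K)]
    s = sum(alpha)
    d = sum(alpha[k] * lik[k] for k in range(K))  # sum_i alpha_i * l_i
    if d <= 0.0:
        raise ValueError("likelihood must be positive for at least one class")

    # Step 2a: posterior means, eq. (18b).
    m = [alpha[k] / (1.0 + s) * (1.0 + lik[k] / d) for k in range(K)]

    # Step 2b: per-singleton precisions, eqs. (16)-(17).
    s_k = []
    for k in range(K):
        alpha_c = s - alpha[k]                     # complement precision
        lik_c = (d - alpha[k] * lik[k]) / alpha_c  # complement average likelihood
        ratio = ((2.0 + s) * lik[k] * lik_c) / (
            d * ((1.0 + alpha[k]) * lik[k] + (1.0 + alpha_c) * lik_c)
        )
        s_k.append((1.0 + s) / (1.0 + ratio))

    # ASSUMPTION (not taken from the source): start from the largest s_k and
    # raise it only if needed so that every alpha_k^+ >= W * a_k, i.e. b_k^+ >= 0.
    s_plus = max(s_k)
    s_plus = max(s_plus, max(W * a[k] / m[k] for k in range(K)))
    alpha_plus = [s_plus * m_k for m_k in m]       # eq. (18a)

    # Step 3: Dirichlet parameters -> opinion, eq. (3).
    u_plus = W / s_plus
    b_plus = [(alpha_plus[k] - W * a[k]) / s_plus for k in range(K)]
    return b_plus, u_plus


b, u, a = [0.6, 0.2], 0.2, [0.5, 0.5]               # hypothetical opinion
vacuous = likelihood_update(b, u, a, [1.0, 1.0])    # opinion unchanged
visible = likelihood_update(b, u, a, [1.0, 0.0])    # same as the counting update
partial = likelihood_update(b, u, a, [0.9, 0.1])    # somewhere in between
```

For this binary example the per-singleton precisions coincide, so only the clamp is a modelling choice; with more than two classes a different rule for combining them would change the result.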


It is shown in [5] that the increase in precision is bounded,
i.e., $0 \le s^+ - s \le 1$. The lower bound occurs for the
vacuous case where $\mathbf{l} = \mathbf{1}$. In essence, the likelihood provides
no information, and by (18), the subjective belief remains
unchanged, i.e., $\boldsymbol{\alpha}^+ = \boldsymbol{\alpha}$. The upper bound occurs for the fully
observable case when $\mathbf{l} = \mathbf{e}_k$, where the likelihood means that
the feature values indicate without a doubt that the $k$-th class
was observed. Then, the $k$-th Dirichlet parameter increments
by one while the other parameters remain the same. These two
cases represent the largest and smallest entropy values for
the likelihood. As demonstrated in [5], as the entropy of the
likelihood decreases, i.e., the likelihood is bolder in espousing
a given class, the corresponding observation increases the
precision of the updated Dirichlet parameters and lowers the
uncertainty of the updated subjective belief.

IV.  Uncertain  Likelihood  Updates 

This section considers how to modify the belief update
in (18) when the likelihood is reported with some degree
of uncertainty. In practice, the likelihood calculated by an
observer is not completely accurate, as discussed in the
introduction. For the remainder of the paper, it is convenient to
collapse the likelihood onto the unit simplex $S$. In (18), the
likelihood $\mathbf{l}$ encodes the observation $\mathbf{x}$ as a $K$-dimensional
vector. In fact, the magnitude of the likelihood vector provides
no information as (18) is invariant to scalings of $\mathbf{l}$. Thus,
a partial observation can be summarized as the normalized
likelihood vector $\bar{\mathbf{l}}$ (see (10)). As long as the normalized
likelihood is correctly calculated, one's belief can be properly
updated. For the remainder of the paper, we will use the term
likelihood to actually refer to the normalized likelihood for
the sake of brevity. The assumption made in this section is
that the distribution of the reported likelihood $\mathbf{l}_r$ conditioned
on the latent actual likelihood $\mathbf{l}$, i.e., $f(\mathbf{l}_r | \mathbf{l})$, is known, and
uncertainty is expressed as the "spread" of this distribution.
Furthermore, it is assumed that the reported likelihood
conditioned on the actual likelihood is independent of the class
appearance $z$ and the appearance probabilities $\mathbf{p}$, i.e.,

$$ f(\mathbf{l}_r | \mathbf{l}, z, \mathbf{p}) = f(\mathbf{l}_r | \mathbf{l}). \tag{19} $$

Given this distribution, this section outlines how to compute
the posterior distribution of the appearance probabilities given
the reported likelihood.

To compute this posterior distribution, the following
theorem is useful.

Theorem 2. The distribution of the true likelihood conditioned
on the latent class appearance satisfies the following identity

$$ f(\mathbf{l} | z = i) = g(\mathbf{l})\, l_i, $$

where

$$ g(\mathbf{l}) = \sum_{i=1}^{K} f(\mathbf{l} | z = i). $$

Proof: Consider the set of observations whose true
likelihood is $\mathbf{l}$, i.e.,

$$ \mathcal{X}(\mathbf{l}) = \left\{ \mathbf{x} : \frac{f(\mathbf{x} | z = i)}{\sum_{k=1}^{K} f(\mathbf{x} | z = k)} = l_i \text{ for } i = 1, \ldots, K \right\}. $$

Now,

$$ \begin{aligned} f(\mathbf{l} | z = i) &= \int_{\mathcal{X}(\mathbf{l})} f(\mathbf{x} | z = i)\, d\mathbf{x}, \\ &= \int_{\mathcal{X}(\mathbf{l})} \left( \sum_{k=1}^{K} f(\mathbf{x} | z = k) \right) l_i\, d\mathbf{x}, \\ &= l_i \sum_{k=1}^{K} \int_{\mathcal{X}(\mathbf{l})} f(\mathbf{x} | z = k)\, d\mathbf{x}, \\ &= l_i\, g(\mathbf{l}). \end{aligned} $$

The theorem states that the distribution of the likelihood
given a specific class appearance is simply a scaled version
of the likelihood of that class. If the observable is the
likelihood and not the observation $\mathbf{x}$, then $f(\mathbf{l} | z = i)$ replaces
$f(\mathbf{x} | z = i)$ in (12). In essence, the theorem states that the
posterior distribution of $\mathbf{p}$ conditioned on the likelihood is
exactly the same as the posterior distribution conditioned on
$\mathbf{x}$ since $f(\mathbf{l} | z)$ is simply a scaled version of the likelihood. In
other words, the likelihood contains the same discriminative
information as $\mathbf{x}$.

The scalar multiplier $g(\mathbf{l})$ is inconsequential when the
reported likelihood is known to be the true likelihood. However,
for the more practical and general case that the observed
reported likelihood is only statistically dependent on the true
likelihood, this multiplier term does affect the posterior
distribution as will be seen. This multiplier encodes the
phenomenology of how the likelihood is generated. If the belief
update is based upon the output of a classifier, the observer
updating his/her belief must understand the distributions of
the features used by the classifier, i.e., $\mathbf{x}$ conditioned on the
appearance states $z$. In most cases, the observer will not
have a more accurate understanding of these distributions than
the classifier. Therefore, in most cases $g(\mathbf{l})$ is unknown to
the observer. It is reasonable to consider the non-informative
multiplier $g(\mathbf{l}) = 1$ when the observer does not have access
to the phenomenology of how the features $\mathbf{x}$ relate to $z$. In
the remainder of the paper, the use of the correct value for
$g(\mathbf{l})$ means that the feature/class likelihood model is employed.
When the likelihood model is not utilized, $g(\mathbf{l})$ is set to one.
As will be seen in the simulations section, the shape of $g(\mathbf{l})$
changes greatly depending on whether the classes are well
separated in the feature space.

The observable is actually the reported likelihood, which
statistically depends only on the true likelihood (see (19)).
Likewise, the true likelihood depends only on the class
appearance $z$, and the class appearance depends on the appearance
probability. In short, the joint probability of all quantities is

$$ f(\mathbf{l}_r, \mathbf{l}, z, \mathbf{p} | \boldsymbol{\alpha}) = f(\mathbf{l}_r | \mathbf{l})\, g(\mathbf{l})\, l_z\, p_z\, f_\beta(\mathbf{p} | \boldsymbol{\alpha}). \tag{20} $$


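Through marginalization of the latent terms $\mathbf{l}$ and $z$, the
posterior distribution of $\mathbf{p}$ given the reported likelihood is
proportional to

$$ f(\mathbf{p} | \mathbf{l}_r, \boldsymbol{\alpha}) \propto \left( \sum_{i=1}^{K} \tilde{l}_i p_i \right) f_\beta(\mathbf{p} | \boldsymbol{\alpha}), \tag{21} $$

where

$$ \tilde{l}_i = f(\mathbf{l}_r | z = i) = \int_{S} f(\mathbf{l}_r | \mathbf{l})\, g(\mathbf{l})\, l_i\, d\mathbf{l} \tag{22} $$

is the class likelihood of the reported likelihood. Comparing
(13) with (21), it is clear that $\tilde{\mathbf{l}}$ replaces the role of $\mathbf{l}$. In other
words, the computation of the belief updates for a reported
likelihood constitutes the transformation of the reported
likelihood via (22) into an effective likelihood for insertion into
the update equations.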
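As a brief sketch of the marginalization step, using only the factorization in (20) and writing $\tilde{l}_z$ for the integral in (22), the latent terms can be removed as follows:

```latex
\begin{aligned}
f(\mathbf{l}_r, \mathbf{p} \mid \boldsymbol{\alpha})
  &= \sum_{z=1}^{K} \int_{S} f(\mathbf{l}_r \mid \mathbf{l})\, g(\mathbf{l})\, l_z\, p_z\,
     f_\beta(\mathbf{p} \mid \boldsymbol{\alpha})\, d\mathbf{l} \\
  &= \sum_{z=1}^{K} p_z \left( \int_{S} f(\mathbf{l}_r \mid \mathbf{l})\, g(\mathbf{l})\, l_z\, d\mathbf{l} \right)
     f_\beta(\mathbf{p} \mid \boldsymbol{\alpha}) \\
  &= \left( \sum_{z=1}^{K} \tilde{l}_z\, p_z \right) f_\beta(\mathbf{p} \mid \boldsymbol{\alpha}).
\end{aligned}
```

Dividing by $f(\mathbf{l}_r \mid \boldsymbol{\alpha})$, which does not depend on $\mathbf{p}$, gives a posterior of the same form as (14) with $\tilde{\mathbf{l}}$ in place of $\mathbf{l}$.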

To illustrate the general report model, we consider models
for $f(\mathbf{l}_r | \mathbf{l})$ that simply represent that the reported likelihood
is an estimate of the actual likelihood within a given level of
uncertainty $u_r$. The uncertainty value $u_r$ is a number between
zero and one. Specifically, the reported likelihood is treated
as a sample from the Dirichlet distribution with parameters
$\frac{1}{u_r}\mathbf{l}$. Thus, the conditional mean is the true likelihood. As
$u_r \to 0$, the conditional distribution approaches a Dirac delta
function centered on the true likelihood. As $u_r$ increases, the
"spread" of the distribution increases. The transformation of
the reported likelihood to the effective likelihood requires a
$(K-1)$-dimensional integration in (22). At this time, we do
not know if the employment of a Dirichlet form for the
conditional distribution affords a closed-form formula for
the transformation that circumvents the need for numerical
integration.

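In this paper, the transformation in (22) is accomplished via
numerical integration. Thus, the paper investigates the binary
$K = 2$ case. Then the normalized effective likelihood is given
by

$$ \tilde{\mathbf{l}} = \left[ \frac{h_1(\mathbf{l}_r)}{h_a(\mathbf{l}_r)},\; 1 - \frac{h_1(\mathbf{l}_r)}{h_a(\mathbf{l}_r)} \right]^T, \tag{23} $$

where

$$ h_1(\mathbf{l}_r) = \int_0^1 \frac{l_{r,1}^{\,y/u_r - 1}\, l_{r,2}^{\,(1-y)/u_r - 1}}{B\!\left(\frac{1}{u_r}[y, (1-y)]^T\right)}\, g^*(y)\, y\, dy, \tag{24a} $$

$$ h_a(\mathbf{l}_r) = \int_0^1 \frac{l_{r,1}^{\,y/u_r - 1}\, l_{r,2}^{\,(1-y)/u_r - 1}}{B\!\left(\frac{1}{u_r}[y, (1-y)]^T\right)}\, g^*(y)\, dy, \text{ and} \tag{24b} $$

$$ g^*(y) = g\!\left([y, 1-y]^T\right). \tag{24c} $$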

Algorithm  2  summarizes  the  process  to  update  an  SL 
multinomial  opinion  using  an  uncertain  reported  likelihood. 


Algorithm 2 Uncertain likelihood update for SL

Input: SL multinomial opinion, reported likelihood $\mathbf{l}_r$, and
feature/class likelihood model $g(\mathbf{l})$

Output: Updated SL multinomial opinion

1) Normalize reported likelihood via (10).

2) Transform normalized reported likelihood into effective
likelihood via (22) using numerical integration.

3) Update SL opinion via Algorithm 1 using the effective
likelihood.
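Steps 1 and 2 for the binary case can be sketched with plain Python. The assumptions here are the Dirichlet (Beta, for K = 2) report model with parameters l / u_r, the non-informative choice g = 1 as the default, and a simple midpoint rule for the integrals in (24); the function names and the number of grid points are hypothetical.

```python
import math
from typing import Callable, List


def report_density(lr1: float, y: float, u_r: float) -> float:
    """f(l_r | l = [y, 1 - y]) for a Beta report model with parameters l / u_r."""
    p, q = y / u_r, (1.0 - y) / u_r
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    log_pdf = log_norm + (p - 1.0) * math.log(lr1) + (q - 1.0) * math.log(1.0 - lr1)
    return math.exp(log_pdf)


def effective_likelihood(
    reported: List[float],
    u_r: float,
    g: Callable[[float], float] = lambda y: 1.0,  # non-informative g(l) = 1
    n: int = 4000,
) -> List[float]:
    """Eqs. (23)-(24) for K = 2, integrated with the midpoint rule."""
    # Step 1: normalize the reported likelihood, eq. (10).
    lr1 = reported[0] / (reported[0] + reported[1])
    if not 0.0 < lr1 < 1.0:
        raise ValueError("reported likelihood must be strictly inside (0, 1)")
    # Step 2: accumulate h_1 and h_a; the grid spacing cancels in their ratio.
    h1 = 0.0
    ha = 0.0
    for j in range(n):
        y = (j + 0.5) / n
        w = report_density(lr1, y, u_r) * g(y)
        h1 += w * y
        ha += w
    l1 = h1 / ha
    return [l1, 1.0 - l1]


sharp = effective_likelihood([0.9, 0.1], u_r=0.01)   # trusted report
vague = effective_likelihood([0.9, 0.1], u_r=1.0)    # highly uncertain report
neutral = effective_likelihood([0.5, 0.5], u_r=0.5)  # symmetric report
```

With g = 1, a nearly certain report stays close to [0.9, 0.1], while a very uncertain report is pulled toward the vacuous likelihood. The resulting vector would then be passed to Algorithm 1 as the likelihood.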


V.  Simulations 

This section uses simulations to compare various methods
to update SL beliefs and to demonstrate the utility of compensating
for the uncertainty associated with the reported likelihoods. The
simulations consider a simple exemplar case with feature
vectors $\mathbf{x}$. For each observation, the appearance of a given
class is drawn from a multinomial distribution $\mathbf{p}$. Given the
appearance of the $k$-th singleton, the feature vector represents
a sample from a Gaussian distribution with mean $\boldsymbol{\mu}_k$ and
covariance $\sigma^2 I$, i.e.,

$$ f(\mathbf{x} | z = k) \propto \exp\left( -\frac{1}{2\sigma^2} \|\mathbf{x} - \boldsymbol{\mu}_k\|^2 \right). \tag{25} $$

Once the feature vector is generated, the class likelihood is
computed via (9) and normalized to generate the ground-truth
values $\mathbf{l}$. Then, the reported likelihood is drawn from
a Dirichlet distribution with parameters $\frac{1}{u_r}\mathbf{l}$ to simulate the
output of the classifier. Note that in the feature space, the
distances between the clusters representing the $K$ classes, e.g.,
Mahalanobis distance between class centroids, are all equal.
In the simulations, the number of classes is $K = 2$.

The parameter $\sigma^2$ controls the separability of the classes
in the feature space. When $\sigma^2$ is small, the classes are well
separated. This means that the true likelihoods are usually
bold, i.e., only one element of $l$ is large. As $\sigma^2$ grows, the
class separability decreases. Given the model used in these
simulations, the $g(l)$ term can be calculated analytically (see


Figure 1. The estimated $g(l)$ distributions and polynomial fits to those
estimates when the class spreads are (a) $\sigma^2 = 0.25$, (b) $\sigma^2 = 1$,
and (c) $\sigma^2 = 2.25$.


proof of Theorem 2), but this calculation is difficult. Instead,
the term is computed numerically via Monte Carlo simulations
where 10,000 simulated true likelihoods are generated when
the class appearance probabilities are all equal. The histogram
of likelihood values is used to estimate $g(l)$. Figure 1 shows
the estimated $g(l)$ for $\sigma^2 = 0.25$, $1$, and $2.25$, respectively. The
figure also plots polynomial fits to the estimated $g(l)$. The
polynomial fits represent the feature/class likelihood models
that are used in the numerical integration of (24). Since $K = 2$,
the likelihoods are uniquely represented by the first element
$l_1$ since $l = [l_1, 1 - l_1]^T$. It is clear in the figure that as
the class separability decreases, the mass in the histograms
migrates from the edges, i.e., bold likelihoods, to the center,
i.e., vacuous likelihoods. Clearly, the shape of $g(l)$ is sensitive
to the class separability in the feature space. The polynomial
approximations to the histograms are used in the likelihood
transformation calculations in the remainder of this section
when the feature/class likelihood models are employed.
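The Monte Carlo estimate of $g(l)$ described above can be sketched as follows for $K = 2$. The sketch is illustrative only: the one-dimensional class means at -1 and +1, the number of histogram bins, the polynomial degree, and the function name `estimate_g` are assumptions made here and are not taken from the paper.

```python
import numpy as np

def estimate_g(sigma2, n_samples=10_000, n_bins=50, degree=6, seed=0):
    """Histogram estimate of the density of l_1 and a polynomial fit to it."""
    rng = np.random.default_rng(seed)
    mu = np.array([-1.0, 1.0])  # ASSUMPTION: hypothetical 1-D class means
    # Equal class appearance probabilities, as in the text.
    k = rng.integers(0, 2, size=n_samples)
    x = mu[k] + np.sqrt(sigma2) * rng.standard_normal(n_samples)
    # l_1 = f(x|1) / (f(x|1) + f(x|2)) written as a logistic function.
    log_ratio = ((x - mu[1]) ** 2 - (x - mu[0]) ** 2) / (2.0 * sigma2)
    l1 = 1.0 / (1.0 + np.exp(-log_ratio))
    # Normalized histogram of l_1 over [0, 1].
    density, edges = np.histogram(l1, bins=n_bins, range=(0.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Polynomial fit; coefficients are ordered from lowest to highest degree.
    coeffs = np.polynomial.polynomial.polyfit(centers, density, degree)
    return centers, density, coeffs

for s2 in (0.25, 1.0, 2.25):
    centers, density, coeffs = estimate_g(s2)
    print(s2, density[0], density[len(density) // 2], density[-1])
```

With these assumed means, the printed edge and center densities show the same qualitative trend as Figure 1: mass moves from the edges toward the center as $\sigma^2$ grows.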

The class separability affects the transformation of the
reported likelihood to the effective likelihood in (22) (or
(24) for $K = 2$). Figure 2(a) plots this transformation of
the reported $l_r = [0.9, 0.1]^T$ over the full range of likelihood
uncertainties $u_r$ for the three class separability models. The
figure also plots the transformation when the model is ignored.
In all cases, as uncertainty increases, the entropy of the
effective likelihood vector increases. As demonstrated in [5],
an increase in entropy of the likelihood vector leads to a smaller
increase in the Dirichlet strength (equivalently, a smaller
decrease in SL uncertainty) in the updated belief. Figure 2(b)
shows the updated SL uncertainty $u$ when the prior SL belief is
$b_1 = b_2 = 0.45$, $u = 0.1$ with a uniform baseline of
$a_1 = a_2 = 0.5$. Clearly, as
the entropy of the effective likelihood increases (or its largest
Figure 2. The effectiveness of a likelihood report of $l_r = [0.9, 0.1]^T$
when employing and not employing the feature/class model for various values
of likelihood uncertainty $u_r$: (a) effective likelihood values, and (b) updated
SL uncertainty when the prior SL opinion is [$b_1 = b_2 = 0.45$, $u = 0.1$,
$a_1 = a_2 = 0.5$].


element decreases), the reduction in SL uncertainty from its initial
value of 0.1 decreases. When the class separability is large,
the increase in entropy of the effective likelihood as likelihood
uncertainty increases is slower than when class separability is
smaller. This means that for a given likelihood uncertainty,
the larger class separability model creates a larger decrease
in the updated SL uncertainty as shown in Figure 2(b). It is
interesting to note that ignoring the feature/class separability
model, i.e., assuming $g(l) = 1$, provides similar behavior in
the likelihood transformation as for the moderate separability
model of $\sigma^2 = 1$.
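To make the qualitative behavior of the transformation concrete, the sketch below computes an effective likelihood for $K = 2$ by numerical integration. It is one plausible reading of the transformation, not a transcription of (22) or (24): it assumes the effective likelihood is the posterior mean of the true $l_1$ given the reported value, with $g$ acting as the prior on $l_1$ and a Beta reporting model whose concentration is `K / u_r`. The function name `effective_likelihood`, that concentration, and the grid size are all assumptions made here.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)

def effective_likelihood(l_r1, u_r, g=None, n_grid=20000):
    """Posterior-mean estimate of [l_1, l_2] given reported l_r1 in (0, 1)."""
    K = 2
    c = K / u_r  # ASSUMPTION: concentration of the Beta reporting model
    # Midpoint grid over candidate true values y = l_1.
    y = (np.arange(n_grid) + 0.5) / n_grid
    a, b = c * y, c * (1.0 - y)
    # Log Beta(a, b) density evaluated at the reported value l_r1.
    log_f = ((a - 1.0) * math.log(l_r1) + (b - 1.0) * math.log(1.0 - l_r1)
             + math.lgamma(c) - _lgamma(a) - _lgamma(b))
    w = np.exp(log_f - log_f.max())
    if g is not None:
        w = w * g(y)  # weight by the feature/class model g([y, 1 - y])
    l_e1 = float(np.sum(y * w) / np.sum(w))
    return np.array([l_e1, 1.0 - l_e1])

for u_r in (0.01, 0.1, 1.0):
    print(u_r, effective_likelihood(0.9, u_r))
```

Leaving `g` as `None` corresponds to ignoring the feature/class model, i.e., $g(l) = 1$; any callable that evaluates a fitted model on the grid can be passed instead. Under these assumptions the effective likelihood stays close to the report for small `u_r` and moves toward the vacuous value $[0.5, 0.5]^T$ as `u_r` grows.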

The simulations compare five different possible SL belief
update methods. First, the clairvoyant method uses the actual
likelihood to perform the update via Algorithm 1. Because the
true likelihood is used, this method serves as the unrealizable
gold standard. The other methods process the reported
likelihoods. The uncertainty compensation with likelihood model
method performs Algorithm 2 using the polynomial
approximation of the true $g(l)$. Likewise, the uncertainty
compensation without the likelihood model performs Algorithm 2 with
$g(l) = 1$. The standard method from [5], i.e., Algorithm 1,
assumes certainty by treating the reported likelihood as the true
likelihood. Finally, the naive method also assumes certainty
but performs the update via (11). These five methods were
evaluated over each of the three feature/class models and over
three levels of reported likelihood uncertainty ($u_r = 0.01$, $0.1$,
or $1$). For each of the nine feature/reported likelihood model
combinations, the SL beliefs are updated one likelihood report
at a time in the order of the received reports over 300
observations. Initially, the beliefs $b$ are zero so that $\alpha = [1, 1]^T$,
and the process is repeated over 1000 independent traces. For
all these simulations the class appearance probabilities are
$p = [2/3, 1/3]^T$.
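The initial condition above (zero beliefs corresponding to Dirichlet parameters $[1, 1]^T$) is consistent with the commonly used mapping between an SL opinion and a Dirichlet distribution with prior weight $W = K$. The sketch below assumes that mapping; the function names are hypothetical and the paper's own equations should be taken as authoritative.

```python
import numpy as np

def opinion_to_dirichlet(b, u, a):
    """Map beliefs b, uncertainty u, base rates a to Dirichlet parameters."""
    b, a = np.asarray(b, dtype=float), np.asarray(a, dtype=float)
    K = len(b)
    # ASSUMPTION: prior weight W = K, so alpha_k = K * (b_k / u + a_k).
    return K * (b / u + a)

def dirichlet_to_opinion(alpha, a):
    """Inverse mapping: Dirichlet parameters back to (beliefs, uncertainty)."""
    alpha, a = np.asarray(alpha, dtype=float), np.asarray(a, dtype=float)
    K = len(alpha)
    strength = alpha.sum()
    return (alpha - K * a) / strength, K / strength

def expected_probability(alpha):
    """Mean of the Dirichlet, i.e., the expected appearance probabilities."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()

# Vacuous starting opinion used in the simulations.
alpha0 = opinion_to_dirichlet([0.0, 0.0], 1.0, [0.5, 0.5])
# Prior opinion used for Figure 2(b).
alpha_fig2 = opinion_to_dirichlet([0.45, 0.45], 0.1, [0.5, 0.5])
print(alpha0, alpha_fig2, expected_probability(alpha_fig2))
```

Under this mapping, a larger Dirichlet strength (the sum of the parameters) corresponds directly to a smaller SL uncertainty, which is why the two quantities are discussed interchangeably in this section.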

The performance results averaged over the 1000 independent
traces are provided in Figure 3. The figure shows plots of
the expected appearance probability $\hat{p}_1$ of the first class
(see (6)) as a function of the number of partial observations
that have been integrated. Note that since $K = 2$, the
relative results for the second class appearance probability are
exactly the same. As one would expect, the clairvoyant method
always converges close to the true appearance probability of
$p_1 = 2/3$. The naive method never works, and the standard
method (no uncertainty compensation) only converges to the
true appearance probability when the likelihood uncertainty is
small, i.e., $u_r = 0.01$. In fact, when the likelihood uncertainty
is small, all the methods but the naive one work just about equally
well. This is because the reported likelihood is always very close to
the true value. For larger uncertainty, the standard method usually
does not perform as well as the uncertainty compensation methods.

Uncertainty compensation using the correct feature/class
separability is almost as effective as the clairvoyant method.
Effectively, its performance tracks the performance of the
clairvoyant method except when the class separability is
poor, i.e., $\sigma^2 = 2.25$, and the likelihood uncertainty is high,
i.e., $u_r = 1$. Even in that case, the feature/class model
based method is able to converge to the correct solution,
but at a slower rate than the clairvoyant method. The slower
convergence is due to the deep discounting of the boldness
of the reported likelihood in the likelihood transformation
(see Figure 2). Uncertainty compensation does not uniformly
perform as well when it ignores the feature/class model. It is
usually better than no compensation at all. For moderate class
separation, i.e., $\sigma^2 = 1$, its performances with and without
the model are about the same, as expected from Figure 2. For
good class separation, i.e., $\sigma^2 = 0.25$, the no model method
overcompensates the effective likelihood relative to the model
method, and its expected appearance probability converges
to a value slightly larger than $p_1 = 2/3$. On the other
hand, when the class separation is poor, i.e., $\sigma^2 = 2.25$, the
no model method undercompensates the effective likelihood,
and the expected appearance probability converges to a value
slightly lower than $p_1 = 2/3$. The undercompensation when
ignoring the model is better than no compensation at all, as the
standard method underestimates the probability more than the
no model uncertainty method.

VI.  Conclusions 

The paper extended SL to incorporate partial observations
when the reported class likelihoods are considered uncertain.
The extension is based upon approximating the results of
the Bayesian update. Specifically, the current subjective belief
corresponds to an equivalent Dirichlet distribution for class
appearance probabilities. The reported class likelihood is treated
as the observation, and the exact form for the posterior
distribution is determined for the class appearance probabilities.
Finally, the updated SL opinion corresponds to the parameters
of a Dirichlet that best approximates the posterior by capturing
the same mean values while approximating the variances.
This work builds upon earlier research that determined how true
class likelihood values should update the SL opinions. It turns
out that considering the uncertainty of the reported likelihoods
is equivalent to transforming the reported likelihoods into an
effective likelihood, which then updates the SL opinions by
the true likelihood update equations from the earlier work.
Figure 3. Expected appearance probability $\hat{p}_1$ when the true probability
is $p_1 = 2/3$ versus number of partial observations for various SL update
methods over three feature/class models and three reported likelihood
uncertainty models: (a) $\sigma^2 = 0.25$, $u_r = 0.01$, (b) $\sigma^2 = 0.25$, $u_r = 0.1$,
(c) $\sigma^2 = 0.25$, $u_r = 1$, (d) $\sigma^2 = 1$, $u_r = 0.01$, (e) $\sigma^2 = 1$, $u_r = 0.1$,
(f) $\sigma^2 = 1$, $u_r = 1$, (g) $\sigma^2 = 2.25$, $u_r = 0.01$, (h) $\sigma^2 = 2.25$, $u_r = 0.1$,
and (i) $\sigma^2 = 2.25$, $u_r = 1$.


Simulations demonstrated the effectiveness of the
uncertainty-aware update method. The exact uncertainty-aware
method, however, requires model knowledge of how the likelihoods
are generated from the actual appearances of the classes. In
other words, the exact method requires knowledge of the
statistical distribution of the observed features conditioned on
the latent class appearance to determine the typical distribution
of likelihood values. Another uncertainty-aware method is
proposed that ignores this feature/class model. The simulations
show that ignoring the model can affect performance in
terms of inferring the underlying appearance probabilities.
However, even when ignoring this model, uncertainty-aware
compensation is usually better and never worse than taking
the reported likelihood values as truth.

It would be interesting to determine if some rudimentary
knowledge about the class separability afforded by the features
can help in the transformation of the reported likelihood
into the effective likelihood. In effect, knowledge of good
class separability should simply retard the increase in the
entropy of the effective likelihood as likelihood uncertainty
increases. Conversely, knowledge of poor class separability
should accelerate the increase in the entropy of the effective
likelihood as likelihood uncertainty increases. Furthermore, the
robustness of the uncertainty-aware updates should be studied
in terms of how well they maintain performance when the
assumed model that generates the reported likelihood
does not match the actual model. Future work also needs
to consider more efficient processing methods that avoid
a $(K - 1)$-dimensional numerical integration. Finally, we are
interested in understanding how intentional obfuscation of the
true likelihood can control the inferencing performance of SL.